Add IFBench RLVR reward helpers #28

Open
nam157 wants to merge 2 commits into allenai:main from nam157:codex/ifbench-rlvr-reward

nam157 commented May 20, 2026

Summary

  • add reward_lib.py for scoring prompt/response pairs with the existing IFBench verifiers
  • expose a batch make_reward_fn(...) helper for RLVR trainers plus structured RewardResult output for debugging reward shaping
  • add run_reward.py as a reproducible local reward smoke runner over prompt/response jsonl files
  • document minimal reward-function and smoke-runner examples, with focused tests
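
For readers unfamiliar with the pattern, the batch helper and structured result described above could look roughly like the sketch below. This is not the code in `reward_lib.py`: the `RewardResult` fields, the `make_reward_fn` signature, the "fraction of instructions followed" reward, and the toy verifiers are all assumptions made for illustration, and the real helper reuses the IFBench strict/loose verifiers instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for an IFBench verifier: takes a response, returns pass/fail.
Verifier = Callable[[str], bool]


@dataclass(frozen=True)
class RewardResult:
    """Illustrative structured output; field names are assumptions."""
    reward: float
    followed: Tuple[bool, ...]


def score_one(response: str, verifiers: Sequence[Verifier]) -> RewardResult:
    # Run every verifier so the per-instruction outcome is kept for debugging.
    followed = tuple(bool(v(response)) for v in verifiers)
    # Assumed shaping: fraction of instructions followed, 0.0 if there are none.
    reward = sum(followed) / len(followed) if followed else 0.0
    return RewardResult(reward=reward, followed=followed)


def make_reward_fn(
    verifiers_by_prompt: Dict[str, Sequence[Verifier]],
) -> Callable[[Sequence[str], Sequence[str]], List[float]]:
    """Build a batch callable mapping (prompts, responses) to a list of floats."""

    def reward_fn(prompts: Sequence[str], responses: Sequence[str]) -> List[float]:
        if len(prompts) != len(responses):
            raise ValueError("prompts and responses must have the same length")
        return [
            score_one(response, verifiers_by_prompt.get(prompt, ())).reward
            for prompt, response in zip(prompts, responses)
        ]

    return reward_fn


# Usage with two toy verifiers (lowercase only, ends with a period).
prompt = "Answer in lowercase and end with a period."
reward_fn = make_reward_fn({prompt: [str.islower, lambda s: s.endswith(".")]})
rewards = reward_fn([prompt, prompt], ["all good.", "Not Lowercase."])
detail = score_one("Not Lowercase.", [str.islower, lambda s: s.endswith(".")])
```

Returning plain floats from the batch callable keeps it easy to plug into a trainer, while the structured result keeps the per-instruction outcomes available when debugging reward shaping.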

Why

The Algora IF-RLVR/Bench bounty calls out a training-oriented integration path for IFBench. The current repository has evaluation scripts but no small, reusable reward function that a training loop can call directly. This change stays lightweight: it reuses the existing strict/loose verifier implementations and adds a CLI smoke path that demonstrates dataset loading and reward scoring end to end.

Context: Prime Intellect IF-RLVR/Bench Algora bounty: https://algora.io/PrimeIntellect-ai/bounties/dderbjHtPwTiGVY4

Validation

  • uv run pytest -q reward_lib_test.py
  • uv run pytest -q
  • uv run python -m run_reward --input_data=data/IFBench_test.jsonl --input_response_data=data/sample_output.jsonl --mode=loose --limit=5
  • smoke-tested reward_lib.make_reward_fn(...) against data/IFBench_test.jsonl
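
As a rough illustration of what a smoke run over prompt/response jsonl files involves, the sketch below joins the two files on the prompt text and applies a row limit before any scoring would happen. It is not the code in `run_reward.py`: the `prompt` and `response` field names, the join key, and the helper names `read_jsonl` and `pair_prompts_with_responses` are assumptions, and the demo uses small temporary files instead of the repository data.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def pair_prompts_with_responses(input_path, response_path, limit=None):
    """Join input rows to responses on the (assumed) "prompt" field."""
    responses = {row["prompt"]: row["response"] for row in read_jsonl(response_path)}
    pairs = []
    for row in read_jsonl(input_path):
        if limit is not None and len(pairs) >= limit:
            break
        prompt = row["prompt"]
        # Skip prompts that have no recorded response instead of failing.
        if prompt in responses:
            pairs.append((prompt, responses[prompt]))
    return pairs


def write_jsonl(path, rows):
    path.write_text("".join(json.dumps(r) + "\n" for r in rows), encoding="utf-8")


# Demo with temporary files standing in for the input and response data.
with tempfile.TemporaryDirectory() as tmp:
    inputs = Path(tmp) / "inputs.jsonl"
    outputs = Path(tmp) / "outputs.jsonl"
    write_jsonl(inputs, [{"prompt": "a"}, {"prompt": "b"}, {"prompt": "c"}])
    write_jsonl(outputs, [{"prompt": "a", "response": "A"},
                          {"prompt": "c", "response": "C"}])
    pairs = pair_prompts_with_responses(inputs, outputs, limit=5)
    first_only = pair_prompts_with_responses(inputs, outputs, limit=1)
```

The resulting pairs are what a reward function would then score, so a failure in this step points at data loading and not at the verifiers.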

nam157 commented May 21, 2026

Author

Small review note: this is a focused reward-function integration for the IF-RLVR/Bench bounty, reusing the existing IFBench verifiers rather than changing evaluator semantics.

Validation run locally:

  • uv run pytest -q reward_lib_test.py
  • uv run pytest -q
  • uv run python -m run_reward --input_data=data/IFBench_test.jsonl --input_response_data=data/sample_output.jsonl --mode=loose --limit=5

Happy to adjust the API shape if maintainers prefer a different trainer-facing entry point.
